trajectory constraint
Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs
Training agents to operate under strict constraints during deployment, such as limited resource budgets or stringent safety requirements, presents significant challenges, especially when these constraints render the task complex. In this work, we propose a curriculum learning strategy that gradually tightens constraints during training, enabling the agent to incrementally master the deployment requirements. Inspired by self-paced learning techniques in unconstrained reinforcement learning (RL), our approach facilitates a smoother transition to challenging environments by initially training on simplified versions of the constraints and progressively introducing the full deployment conditions. We provide a theoretical analysis using an RL agent in a binary-tree Markov Decision Process (MDP) to demonstrate that our curriculum strategy can accelerate training relative to a baseline approach that imposes the trajectory constraints from the outset.
Generalized Planning: Non-Deterministic Abstractions and Trajectory Constraints
Bonet, Blai, De Giacomo, Giuseppe, Geffner, Hector, Rubin, Sasha
We study the characterization and computation of general policies for families of problems that share a structure characterized by a common reduction into a single abstract problem. Policies $\mu$ that solve the abstract problem P have been shown to solve all problems Q that reduce to P provided that $\mu$ terminates in Q. In this work, we shed light on why this termination condition is needed and how it can be removed. The key observation is that the abstract problem P captures the common structure among the concrete problems Q that is local (Markovian) but misses common structure that is global. We show how such global structure can be captured by means of trajectory constraints that in many cases can be expressed as LTL formulas, thus reducing generalized planning to LTL synthesis. Moreover, for a broad class of problems that involve integer variables that can be increased or decreased, trajectory constraints can be compiled away, reducing generalized planning to fully observable non-deterministic planning.
Finding and Exploiting LTL Trajectory Constraints in Heuristic Search
Simon, Salomé (University of Basel) | Röger, Gabriele (University of Basel)
Temporal logics allow to formulate and reason about the development A unified formalism for these techniques would offer two of logic-based systems, for example about paths main advantages: decoupling the derivation and exploitation in factored state spaces. These are for instance common in of information and easily combining different sources planning, where temporal logics have always been present. of information. As one extreme, the entire planning task can be specified in a Currently the derivation and exploitation of information temporal logic language and plans are generated by theorem are integrated in most cases: someone proposes a new source proving (Koehler and Treinen 1995) or model construction of information and shows how it can correctly be exploited (Cerrito and Mayer 1998).
Reasoning on LTL on Finite Traces: Insensitivity to Infiniteness
Giacomo, Giuseppe De (Sapienza Università di Roma) | Masellis, Riccardo De (Sapienza Università di Roma) | Montali, Marco (Free University of Bozen-Bolzano)
In this paper we study when an LTL formula on finite traces (LTLf formula) is insensitive to infiniteness, that is, it can be correctly handled as a formula on infinite traces under the assumption that at a certain point the infinite trace starts repeating an end event forever, trivializing all other propositions to false. This intuition has been put forward and (wrongly) assumed to hold in general in the literature. We define a necessary and sufficient condition to characterize whether an LTLf formula is insensitive to infiniteness, which can be automatically checked by any LTL reasoner. Then, we show that typical LTLf specification patterns used in process and service modeling in CS, as well as trajectory constraints in Planning and transition-based LTLf specifications of action domains in KR, are indeed very often insensitive to infiniteness. This may help to explain why the assumption of interpreting LTL on finite and on infinite traces has been (wrongly) blurred. Possibly because of this blurring, virtually all literature detours to Buechi automata for constructing the NFA that accepts the traces satisfying an LTLf formula. As a further contribution, we give a simple direct algorithm for computing such NFA.